

# Department of Computer Science & Engineering Microprocessor & Computer Architecture

# **UNIT 2 Question and answers**

**TOPIC: PIPELINE AND HAZARDS** 

- 1. Instruction execution in a processor is divided into 5 stages: *Instruction Fetch* (IF), *Instruction Decode* (ID), *Operand Fetch* (OF), *Execute* (EX), and *Write Back* (WB). These stages take 5, 4, 20, 10, and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated:
- (i) a naive pipeline implementation (NP) with 5 stages and
- (ii) an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively.

The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is \_\_\_\_\_.

#### Ans. 1.49 to 1.52
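The answer range can be checked with a short calculation. The sketch below assumes the usual model for a hazard-free linear pipeline: the cycle time is the slowest stage plus the buffer delay, and n instructions on k stages need k + n - 1 cycles. The function name `pipeline_total_time` is illustrative, and the 8 ns value for OF2 is an assumption (the remainder of the 20 ns OF stage after OF1 takes 12 ns); any OF2 delay up to 12 ns gives the same result.

```python
def pipeline_total_time(stage_ns, buffer_ns, n_instructions):
    """Total time (ns) for n hazard-free instructions on a linear pipeline."""
    cycle = max(stage_ns) + buffer_ns              # slowest stage plus buffer delay
    cycles = len(stage_ns) + n_instructions - 1    # fill the pipe, then one per cycle
    return cycles * cycle

# Stage delays from the question: IF, ID, OF, EX, WB.
np_stages = [5, 4, 20, 10, 3]
# Assumption: OF is split into OF1 = 12 ns and OF2 = 8 ns.
ep_stages = [5, 4, 12, 8, 10, 3]

t_np = pipeline_total_time(np_stages, 2, 20)   # 24 cycles x 22 ns
t_ep = pipeline_total_time(ep_stages, 2, 20)   # 25 cycles x 14 ns
speedup = t_np / t_ep

print(t_np, t_ep, round(speedup, 2))   # 528 350 1.51
```

The extra stage in EP costs one more fill cycle, but the shorter clock period (14 ns instead of 22 ns) more than makes up for it.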

2. Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages, but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is \_\_\_\_\_.

#### Answer: 3.2
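One way to see where 3.2 comes from is to compare the average time per instruction in the two designs. The sketch below assumes that a stall-free pipeline completes one instruction per cycle (CPI = 1); the helper name `time_per_instruction_ns` is illustrative.

```python
def time_per_instruction_ns(clock_ghz, cpi):
    """Average time per instruction in ns: CPI x cycle time, where cycle time = 1 / f."""
    return cpi / clock_ghz   # a 1 GHz clock has a 1 ns cycle

non_pipelined = time_per_instruction_ns(2.5, 4)   # 4 cycles x 0.4 ns = 1.6 ns
pipelined = time_per_instruction_ns(2.0, 1)       # 1 cycle x 0.5 ns = 0.5 ns
speedup = non_pipelined / pipelined

print(round(speedup, 2))   # 3.2
```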

3. Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to

.

#### Answer: 4

- 4. Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.
- P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
- P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
- P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
- P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.

Which processor has the highest peak clock frequency?

- (A) P1
- (B) P2
- (C) P3
- (D) P4

#### Answer: (C) P3

5. Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure.



What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

- (A) 4.0
- (B) 2.5
- (C) 1.1
- (D) 3.0

#### Answer: (B) 2.5

- 6. Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns. Given latch delay is 10 ns. Calculate-
  - 1. Pipeline cycle time
  - 2. Non-pipeline execution time
  - 3. Speed up ratio
  - 4. Pipeline time for 1000 tasks
  - 5. Sequential time for 1000 tasks
  - 6. Throughput

#### Solution-

#### Given-

- Four stage pipeline is used
- Delay of stages = 60, 50, 90 and 80 ns
- Latch delay or delay due to each register = 10 ns

#### Part-01: Pipeline Cycle Time-

#### Cycle time

- = Maximum delay due to any stage + Delay due to its register
- = Max { 60, 50, 90, 80 } + 10 ns
- = 90 ns + 10 ns
- = 100 ns

#### Part-02: Non-Pipeline Execution Time-

#### Non-pipeline execution time for one instruction

- = 60 ns + 50 ns + 90 ns + 80 ns
- = 280 ns

#### Part-03: Speed Up Ratio-

#### Speed up

- = Non-pipeline execution time / Pipeline execution time
- = 280 ns / Cycle time
- = 280 ns / 100 ns
- = 2.8

#### Part-04: Pipeline Time For 1000 Tasks-

#### Pipeline time for 1000 tasks

- = Time taken for 1st task + Time taken for remaining 999 tasks
- = 1 x 4 clock cycles + 999 x 1 clock cycle
- = 4 x cycle time + 999 x cycle time
- = 4 x 100 ns + 999 x 100 ns
- = 400 ns + 99900 ns
- = 100300 ns

#### Part-05: Sequential Time For 1000 Tasks-

#### Non-pipeline time for 1000 tasks

- = 1000 x Time taken for one task
- = 1000 x 280 ns
- = 280000 ns

#### Part-06: Throughput-

Throughput for pipelined execution

- = Number of instructions executed per unit time
- = 1000 tasks / 100300 ns
- ≈ 0.00997 tasks per ns
- ≈ 9.97 x 10^6 tasks per second
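The six parts above can be reproduced in a few lines. This is a minimal sketch using the values given in the question; the variable names are illustrative.

```python
stages_ns = [60, 50, 90, 80]   # delays of the four phases
latch_ns = 10                  # latch (register) delay
n_tasks = 1000

cycle_ns = max(stages_ns) + latch_ns                         # Part 1: 100 ns
non_pipe_ns = sum(stages_ns)                                 # Part 2: 280 ns
speedup = non_pipe_ns / cycle_ns                             # Part 3: 2.8
pipe_total_ns = (len(stages_ns) + n_tasks - 1) * cycle_ns    # Part 4: 100300 ns
seq_total_ns = n_tasks * non_pipe_ns                         # Part 5: 280000 ns
throughput_per_s = n_tasks / (pipe_total_ns * 1e-9)          # Part 6: tasks per second

print(cycle_ns, non_pipe_ns, speedup, pipe_total_ns, seq_total_ns)
print(f"{throughput_per_s:.3e} tasks per second")
```

Note that the speed up ratio in Part 3 compares a single instruction against one steady-state cycle; comparing the 1000-task totals instead (280000 ns against 100300 ns) gives a slightly lower figure because the pipeline fill time is included.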
- 7. A four stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively. Registers are used between the stages and have a delay of 5 ns each. Assuming constant clocking rate, the total time taken to process 1000 data items on the pipeline will be-
  - (A) 120.4 microseconds
  - (B) 160.5 microseconds
  - (C) 165.5 microseconds
  - (D) 590.0 microseconds

#### Solution-

#### Given-

- Four stage pipeline is used
- Delay of stages = 150, 120, 160 and 140 ns
- Delay due to each register = 5 ns
- 1000 data items or instructions are processed

#### Cycle Time-

#### Cycle time

- = Maximum delay due to any stage + Delay due to its register
- = Max { 150, 120, 160, 140 } + 5 ns
- = 160 ns + 5 ns
- = 165 ns

#### Pipeline Time To Process 1000 Data Items-

Pipeline time to process 1000 data items

- = Time taken for 1st data item + Time taken for remaining 999 data items

- = 1 x 4 clock cycles + 999 x 1 clock cycle
- = 4 x cycle time + 999 x cycle time
- = 4 x 165 ns + 999 x 165 ns
- = 660 ns + 164835 ns
- = 165495 ns
- ≈ 165.5 μs

Thus, Option (C) is correct.
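As a quick check, the sketch below recomputes the total time and picks the closest of the four choices. It assumes the choices are lettered A to D in the order listed, which is how the solution refers to them; the dictionary `options_us` is illustrative.

```python
stages_ns = [150, 120, 160, 140]
register_ns = 5
n_items = 1000

cycle_ns = max(stages_ns) + register_ns                        # 165 ns
total_us = (len(stages_ns) + n_items - 1) * cycle_ns / 1000    # ns to microseconds

# Assumption: choices lettered in the order they appear in the question.
options_us = {"A": 120.4, "B": 160.5, "C": 165.5, "D": 590.0}
closest = min(options_us, key=lambda k: abs(options_us[k] - total_us))

print(total_us, closest)   # 165.495 C
```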

- 8. Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of 4. The same processor is upgraded to a pipelined processor with five stages but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume there are no stalls in the pipeline. The speed up achieved in this pipelined processor is-
  - (A) 3.2
  - (B) 3.0
  - (C) 2.2
  - (D) 2.0

#### Solution-

#### Cycle Time in Non-Pipelined Processor-

Frequency of the clock = 2.5 gigahertz

Cycle time

- = 1 / frequency
- = 1 / (2.5 gigahertz)
- = 1 / (2.5 x 10^9 hertz)
- = 0.4 ns

#### Non-Pipeline Execution Time-

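Non-pipeline execution time to process 1 instruction

- = Number of clock cycles taken to execute one instruction
- = 4 clock cycles
- = 4 x 0.4 ns
- = 1.6 ns

#### Cycle Time in Pipelined Processor-

Frequency of the clock = 2 gigahertz

Cycle time

- = 1 / frequency
- = 1 / (2 gigahertz)
- = 0.5 ns

#### Pipeline Execution Time-

Since there are no stalls, one instruction completes every clock cycle.

Pipeline execution time

- = 1 clock cycle
- = 0.5 ns

#### Speed Up-

Speed up

- = Non-pipeline execution time / Pipeline execution time
- = 1.6 ns / 0.5 ns
- = 3.2

Thus, Option (A) is correct.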



#### Solution-

#### Execution Time in 4 Stage Pipeline-

#### Cycle time

- = Maximum delay due to any stage + Delay due to its register
- = Max { 800, 500, 400, 300 } + 0
- = 800 picoseconds

Thus, Execution time in 4 stage pipeline = 1 clock cycle = 800 picoseconds.

#### Throughput in 4 Stage Pipeline-

#### Throughput

- = Number of instructions executed per unit time
- = 1 instruction / 800 picoseconds

#### Execution Time in 2 Stage Pipeline-

#### Cycle time

- = Maximum delay due to any stage + Delay due to its register
- = Max { 600, 350 } + 0
- = 600 picoseconds

Thus, Execution time in 2 stage pipeline = 1 clock cycle = 600 picoseconds.

#### Throughput in 2 Stage Pipeline-

#### Throughput

- = Number of instructions executed per unit time
- = 1 instruction / 600 picoseconds

#### Throughput Increase-

#### Throughput increase

- = { (Final throughput - Initial throughput) / Initial throughput } x 100
- = { (1 / 600 - 1 / 800) / (1 / 800) } x 100
- = { (800 / 600) - 1 } x 100
- = (1.3333 - 1) x 100
- = 0.3333 x 100
- = 33.33 %
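The percentage can be verified directly from the two cycle times. The sketch below takes throughput as one instruction per clock cycle, as the solution does; the function name `throughput_increase_percent` is illustrative.

```python
def throughput_increase_percent(old_cycle_ps, new_cycle_ps):
    """Relative throughput gain when the cycle time drops from old to new."""
    old_tp = 1 / old_cycle_ps   # instructions per picosecond before the change
    new_tp = 1 / new_cycle_ps   # instructions per picosecond after the change
    return (new_tp - old_tp) / old_tp * 100

increase = throughput_increase_percent(800, 600)
print(f"{increase:.2f} %")   # 33.33 %
```

Because throughput is the reciprocal of the cycle time, the expression simplifies to (old cycle / new cycle - 1) x 100, which is the 800 / 600 step in the derivation.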
- 10. Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure.



What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

- (A) 4.0
- (B) 2.5
- (C) 1.1
- (D) 3.0

#### Solution-

#### Non-Pipeline Execution Time-

Non-pipeline execution time for 1 instruction

- = 5 ns + 6 ns + 11 ns + 8 ns
- = 30 ns

#### Cycle Time in Pipelined Processor-

#### Cycle time

- = Maximum delay due to any stage + Delay due to its register
- = Max { 5, 6, 11, 8 } + 1 ns
- = 11 ns + 1 ns
- = 12 ns

#### Pipeline Execution Time-

Pipeline execution time

- = 1 clock cycle
- = 12 ns

#### Speed Up-

#### Speed up

- = Non-pipeline execution time / Pipeline execution time
- = 30 ns / 12 ns
- = 2.5

Thus, Option (B) is correct.
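To illustrate what "steady state" means here, the sketch below computes the speed up for a finite number of instructions and shows it approaching 2.5 as the count grows. It uses the stage and register delays that appear in the solution (5, 6, 11, 8 ns and 1 ns) and, following the solution, ignores register delay in the non-pipelined case; the function name `speedup_for` is illustrative.

```python
stage_ns = [5, 6, 11, 8]   # S1..S4 delays as used in the solution
register_ns = 1            # pipeline register delay as used in the solution

non_pipelined_ns = sum(stage_ns)                    # 30 ns per instruction
pipelined_cycle_ns = max(stage_ns) + register_ns    # 12 ns per cycle

def speedup_for(n):
    """Speed up for n instructions, including the pipeline fill time."""
    sequential = n * non_pipelined_ns
    pipelined = (len(stage_ns) + n - 1) * pipelined_cycle_ns
    return sequential / pipelined

for n in (1, 10, 1000, 10**6):
    print(n, round(speedup_for(n), 3))

steady_state = non_pipelined_ns / pipelined_cycle_ns   # limit as n grows
print(steady_state)   # 2.5
```

For a single instruction the pipeline is actually slower (30 ns against 48 ns), which is why the question specifies steady state.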

11. Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3 and I4 in stages S1, S2, S3 and S4 is shown below-

|    | S1 | S2 | S3 | S4 |
|----|----|----|----|----|
| I1 | 2  | 1  | 1  | 1  |
| I2 | 1  | 3  | 2  | 3  |
| I3 | 2  | 1  | 1  |    |
| I4 | 1  | 2  | 2  | 2  |

What is the number of cycles needed to execute the following loop?

- (A) 16
- (B) 23
- (C) 28
- (D) 30

#### Solution-

The phase-time diagram is-



**Phase-Time Diagram** 

From here, number of clock cycles required to execute the loop = 23 clock cycles.

Thus, Option (B) is correct.

- 12. Consider the following processors. Assume that the pipeline registers have zero latency.
- P1: 4 stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns
- P2: 4 stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns
- P3: 5 stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns
- P4: 5 stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns

Which processor has the highest peak clock frequency?

- (A) P1
- (B) P2
- (C) P3
- (D) P4

#### Solution-

It is given that pipeline registers have zero latency. Thus,

Cycle time

- = Maximum delay due to any stage + Delay due to its register
- = Maximum delay due to any stage

#### For Processor P1:

#### Cycle time

- = Max { 1 ns, 2 ns, 2 ns, 1 ns }
- = 2 ns

#### Clock frequency

- = 1 / Cycle time
- = 1 / 2 ns
- = 0.5 gigahertz

#### For Processor P2:

#### Cycle time

- = Max { 1 ns, 1.5 ns, 1.5 ns, 1.5 ns }
- = 1.5 ns

#### Clock frequency

- = 1 / Cycle time
- = 1 / 1.5 ns
- = 0.67 gigahertz

#### For Processor P3:

#### Cycle time

- = Max { 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns }
- = 1 ns

#### Clock frequency

- = 1 / Cycle time
- = 1 / 1 ns
- = 1 gigahertz

#### For Processor P4:

#### Cycle time

- = Max { 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns }
- = 1.1 ns

#### Clock frequency

- = 1 / Cycle time
- = 1 / 1.1 ns
- = 0.91 gigahertz

Clearly, Processor P3 has the highest peak clock frequency.

Thus, Option (C) is correct.
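The comparison above amounts to finding the processor whose slowest stage is shortest. A minimal sketch, with the stage latencies copied from the question and illustrative variable names:

```python
# Stage latencies in ns; pipeline registers have zero latency.
processors = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1.5],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}

# Peak frequency in GHz is the reciprocal of the slowest stage latency in ns.
peak_ghz = {name: 1 / max(latencies) for name, latencies in processors.items()}
best = max(peak_ghz, key=peak_ghz.get)

for name, freq in peak_ghz.items():
    print(f"{name}: {freq:.2f} GHz")
print("Highest:", best)
```

The number of stages does not enter the calculation; only the longest single stage limits the clock.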

13. What are the advantages and disadvantages of a single-cycle datapath?

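Single-cycle datapath implementations execute each instruction in a single clock cycle. The advantage is simplicity: the control logic is straightforward and every instruction has a CPI of 1. The disadvantage is speed: the clock period must be long enough for the slowest instruction to settle through the entire datapath, so every instruction pays that worst-case delay (when compared with a multi-cycle datapath).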